POSIX Regular Expression Parsing with Derivatives
نویسندگان
چکیده
We adapt the POSIX policy to the setting of regular expression parsing. POSIX favors longest left-most parse trees. Compared to other policies such as greedy left-most, the POSIX policy is more intuitive but much harder to implement. Almost all POSIX implementations are buggy as observed by Kuklewicz. We show how to obtain a POSIX algorithm for the general parsing problem based on Brzozowski’s regular expression derivatives. Correctness is fairly straightforward to establish and our benchmark results show that our approach is promising.
منابع مشابه
POSIX Lexing with Derivatives of Regular Expressions (Proof Pearl)
Brzozowski introduced the notion of derivatives for regular expressions. They can be used for a very simple regular expression matching algorithm. Sulzmann and Lu cleverly extended this algorithm in order to deal with POSIX matching, which is the underlying disambiguation strategy for regular expressions needed in lexers. Sulzmann and Lu have made available on-line what they call a “rigorous pr...
متن کاملDerivative-Based Diagnosis of Regular Expression Ambiguity
Regular expressions are often ambiguous. We present a novel method based on Brzozowski’s derivatives to aid the user in diagnosing ambiguous regular expressions. We introduce a derivative-based finite state transducer to generate parse trees and minimal counter-examples. The transducer can be easily customized to either follow the POSIX or Greedy disambiguation policy and based on a finite set ...
متن کاملRecognising and Generating Terms using Derivatives of Parsing Expression Grammars
Grammar-based sentence generation has been thoroughly explored for Context-Free Grammars (CFGs), but remains unsolved for recognition-based approaches such as Parsing Expression Grammars (PEGs). Lacking tool support, language designers using PEGs have difficulty predicting the behaviour of their parsers. In this paper, we extend the idea of derivatives, originally formulated for regular express...
متن کاملCertified Derivative-Based Parsing of Regular Expressions
We describe the formalization of a certified algorithm for regular expression parsing based on Brzozowski derivatives, in the dependently typed language Idris. The formalized algorithm produces a proof that an input string matches a given regular expression or a proof that no matching exists. A tool for regular expression based search in the style of the well known GNU grep has been developed w...
متن کاملFIRE/J - optimizing regular expression searches with generative programming
Regular expressions are a powerful tool for analyzing and manipulating text. Their theoretical background lies within automata theory and formal languages. The FIRE/J (Fast Implementation of Regular Expressions for Java) regular expression library is designed to provide maximum execution speed, while remaining portable across different machine architectures. To achieve that, FIRE/J transforms e...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014